Overview

Dataset statistics

Number of variables: 12
Number of observations: 99003
Missing cells: 11448
Missing cells (%): 1.0%
Duplicate rows: 0
Duplicate rows (%): 0.0%
Total size in memory: 8.4 MiB
Average record size in memory: 89.0 B

Variable types

Numeric: 10
Categorical: 2

Warnings

age is highly correlated with age_clean [High correlation]
likes_received is highly correlated with mobile_likes_received [High correlation]
mobile_likes_received is highly correlated with likes_received [High correlation]
age_clean is highly correlated with age [High correlation]
age_clean has 5723 (5.8%) missing values [Missing]
age_Group has 5723 (5.8%) missing values [Missing]
likes_received is highly skewed (γ1 = 112.0745682) [Skewed]
mobile_likes_received is highly skewed (γ1 = 107.5312999) [Skewed]
userid has unique values [Unique]
friend_count has 1962 (2.0%) zeros [Zeros]
friendships_initiated has 2997 (3.0%) zeros [Zeros]
likes has 22308 (22.5%) zeros [Zeros]
likes_received has 24428 (24.7%) zeros [Zeros]
mobile_likes has 35056 (35.4%) zeros [Zeros]
mobile_likes_received has 30003 (30.3%) zeros [Zeros]
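A minimal sketch of how warnings like the ones above can be derived with pandas. The thresholds are hypothetical values picked for illustration (the report's actual thresholds are set in its configuration and are not shown here), `sketch_warnings` is an illustrative helper rather than part of pandas-profiling, and the toy frame is made-up data that only borrows column names from this dataset.

```python
import pandas as pd

# Hypothetical thresholds chosen for illustration; they are NOT read from
# pandas-profiling's configuration and may differ from its defaults.
MISSING_MIN_FRACTION = 0.01
ZEROS_MIN_FRACTION = 0.01
SKEW_THRESHOLD = 20.0
CORR_THRESHOLD = 0.9


def sketch_warnings(df: pd.DataFrame) -> list:
    """Return report-style warning strings for a DataFrame."""
    n = len(df)
    out = []
    for col in df.columns:
        missing = int(df[col].isna().sum())
        if missing / n >= MISSING_MIN_FRACTION:
            out.append(f"{col} has {missing} ({missing / n:.1%}) missing values [Missing]")
        if df[col].nunique() == n:
            out.append(f"{col} has unique values [Unique]")

    numeric = df.select_dtypes(include="number")
    for col in numeric.columns:
        skew = numeric[col].skew()  # NaN skew compares False and is skipped
        if abs(skew) > SKEW_THRESHOLD:
            out.append(f"{col} is highly skewed (γ1 = {skew:.7f}) [Skewed]")
        zeros = int((numeric[col] == 0).sum())
        if zeros / n >= ZEROS_MIN_FRACTION:
            out.append(f"{col} has {zeros} ({zeros / n:.1%}) zeros [Zeros]")

    # Pairwise Pearson correlations; both directions are listed, as in the report.
    corr = numeric.corr()
    for a in corr.columns:
        for b in corr.columns:
            if a != b and abs(corr.loc[a, b]) > CORR_THRESHOLD:
                out.append(f"{a} is highly correlated with {b} [High correlation]")
    return out


# Toy data (not taken from the report).
toy = pd.DataFrame({
    "userid": [101, 102, 103, 104],
    "likes": [0, 1, 2, 3],
    "mobile_likes": [0, 2, 4, 6],
    "age_clean": [14.0, None, 30.0, 45.0],
})
for line in sketch_warnings(toy):
    print(line)
```

On real data the unique-values check only makes sense for identifier-like columns such as userid; on a four-row toy frame it fires for several columns.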

Reproduction

Analysis started: 2021-04-25 06:21:03.998476
Analysis finished: 2021-04-25 06:21:57.699678
Duration: 53.7 seconds
Software version: pandas-profiling v2.11.0
Download configuration: config.yaml

Variables

userid
Real number (ℝ≥0)

UNIQUE

Distinct: 99003
Distinct (%): 100.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 1597045.208
Minimum: 1000008
Maximum: 2193542
Zeros: 0
Zeros (%): 0.0%
Memory size: 773.6 KiB

Quantile statistics

Minimum: 1000008
5th percentile: 1060618.3
Q1: 1298805.5
Median: 1596148
Q3: 1895744
95th percentile: 2133357.1
Maximum: 2193542
Range: 1193534
Interquartile range (IQR): 596938.5

Descriptive statistics

Standard deviation: 344059.1775
Coefficient of variation (CV): 0.2154348391
Kurtosis: -1.199556831
Mean: 1597045.208
Median Absolute Deviation (MAD): 298438
Skewness: 0.0001076605667
Sum: 1.581122667 × 10¹¹
Variance: 1.183767176 × 10¹¹
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
1159224 | 1 | < 0.1%
1129202 | 1 | < 0.1%
1055510 | 1 | < 0.1%
1855227 | 1 | < 0.1%
2110369 | 1 | < 0.1%
1991449 | 1 | < 0.1%
2128666 | 1 | < 0.1%
1884335 | 1 | < 0.1%
2082123 | 1 | < 0.1%
1026848 | 1 | < 0.1%
Other values (98993) | 98993 | > 99.9%

Minimum 5 values

Value | Count | Frequency (%)
1000008 | 1 | < 0.1%
1000013 | 1 | < 0.1%
1000015 | 1 | < 0.1%
1000038 | 1 | < 0.1%
1000059 | 1 | < 0.1%

Maximum 5 values

Value | Count | Frequency (%)
2193542 | 1 | < 0.1%
2193538 | 1 | < 0.1%
2193522 | 1 | < 0.1%
2193499 | 1 | < 0.1%
2193485 | 1 | < 0.1%
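A minimal pandas sketch of how the quantile and descriptive statistics above relate to each other. It assumes MAD means the median absolute deviation around the median, which fits the reported values (for userid, MAD 298438 is about half the IQR of 596938.5, as expected for a roughly uniform distribution). `describe_numeric` is a hypothetical helper, not a pandas-profiling function.

```python
import pandas as pd


def describe_numeric(s: pd.Series) -> dict:
    """Sketch of the per-variable quantile and descriptive statistics."""
    x = s.dropna()
    # Linear-interpolation quantiles (pandas' default)
    q1, median, q3 = x.quantile([0.25, 0.5, 0.75])
    mean = x.mean()
    std = x.std()  # sample standard deviation (ddof=1)
    return {
        "range": x.max() - x.min(),
        "iqr": q3 - q1,
        "cv": std / mean,  # coefficient of variation
        # Median absolute deviation: median of |x - median(x)|
        "mad": (x - median).abs().median(),
        "skewness": x.skew(),  # bias-adjusted sample skewness
        "kurtosis": x.kurt(),  # excess kurtosis (normal distribution -> 0)
        "variance": x.var(),  # sample variance (ddof=1)
        "sum": x.sum(),
    }


print(describe_numeric(pd.Series([1, 2, 3, 4, 100])))
```

As a consistency check on the userid figures: 344059.1775 / 1597045.208 ≈ 0.2154, the reported CV, and Q3 − Q1 = 1895744 − 1298805.5 = 596938.5, the reported IQR.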

age
Real number (ℝ≥0)

HIGH CORRELATION

Distinct: 101
Distinct (%): 0.1%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 37.28022383
Minimum: 13
Maximum: 113
Zeros: 0
Zeros (%): 0.0%
Memory size: 773.6 KiB

Quantile statistics

Minimum: 13
5th percentile: 15
Q1: 20
Median: 28
Q3: 50
95th percentile: 90
Maximum: 113
Range: 100
Interquartile range (IQR): 30

Descriptive statistics

Standard deviation: 22.58974831
Coefficient of variation (CV): 0.6059445462
Kurtosis: 1.561446767
Mean: 37.28022383
Median Absolute Deviation (MAD): 10
Skewness: 1.415260654
Sum: 3690854
Variance: 510.2967289
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
18 | 5196 | 5.2%
23 | 4404 | 4.4%
19 | 4391 | 4.4%
20 | 3769 | 3.8%
21 | 3671 | 3.7%
25 | 3641 | 3.7%
17 | 3283 | 3.3%
16 | 3086 | 3.1%
22 | 3032 | 3.1%
24 | 2827 | 2.9%
Other values (91) | 61703 | 62.3%

Minimum 5 values

Value | Count | Frequency (%)
13 | 484 | 0.5%
14 | 1925 | 1.9%
15 | 2618 | 2.6%
16 | 3086 | 3.1%
17 | 3283 | 3.3%

Maximum 5 values

Value | Count | Frequency (%)
113 | 202 | 0.2%
112 | 18 | < 0.1%
111 | 18 | < 0.1%
110 | 15 | < 0.1%
109 | 9 | < 0.1%

gender
Categorical

Distinct: 3
Distinct (%): < 0.1%
Missing: 0
Missing (%): 0.0%
Memory size: 773.6 KiB
male: 58574
female: 40254
no gender: 175

Length

Max length: 9
Median length: 4
Mean length: 4.822025595
Min length: 4

Characters and Unicode

Total characters: 477395
Distinct characters: 11
Distinct categories: 2
Distinct scripts: 2
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.
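The gender character statistics can be reproduced from the category counts alone. A minimal sketch using Python's standard `unicodedata` module, which exposes the general category (`Ll` = Lowercase Letter, `Zs` = Space Separator) but not the script or block, so those two groupings are left out. The counts are copied from this report's gender table.

```python
import unicodedata
from collections import Counter

# Value counts copied from the gender table in this report.
gender_counts = {"male": 58574, "female": 40254, "no gender": 175}

# Count every character occurrence, weighted by how often each value occurs.
char_counts = Counter()
for value, count in gender_counts.items():
    for ch in value:
        char_counts[ch] += count

# Group characters by their Unicode general category.
category_counts = Counter()
for ch, count in char_counts.items():
    category_counts[unicodedata.category(ch)] += count

total_characters = sum(char_counts.values())
print(total_characters, len(char_counts), dict(category_counts))
# 477395 11 {'Ll': 477220, 'Zs': 175}
```

These match the report's total characters, distinct characters, and the Lowercase Letter and Space Separator counts.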

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: male
2nd row: female
3rd row: male
4th row: female
5th row: male
Value | Count | Frequency (%)
male | 58574 | 59.2%
female | 40254 | 40.7%
no gender | 175 | 0.2%

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
male | 58574 | 59.1%
female | 40254 | 40.6%
no | 175 | 0.2%
gender | 175 | 0.2%

Most occurring characters

Value | Count | Frequency (%)
e | 139432 | 29.2%
m | 98828 | 20.7%
a | 98828 | 20.7%
l | 98828 | 20.7%
f | 40254 | 8.4%
n | 350 | 0.1%
o | 175 | < 0.1%
(space) | 175 | < 0.1%
g | 175 | < 0.1%
d | 175 | < 0.1%

Most occurring categories

Value | Count | Frequency (%)
Lowercase Letter | 477220 | > 99.9%
Space Separator | 175 | < 0.1%

Most frequent character per category

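Lowercase Letter

Value | Count | Frequency (%)
e | 139432 | 29.2%
m | 98828 | 20.7%
a | 98828 | 20.7%
l | 98828 | 20.7%
f | 40254 | 8.4%
n | 350 | 0.1%
o | 175 | < 0.1%
g | 175 | < 0.1%
d | 175 | < 0.1%
r | 175 | < 0.1%

Space Separator

Value | Count | Frequency (%)
(space) | 175 | 100.0%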

Most occurring scripts

Value | Count | Frequency (%)
Latin | 477220 | > 99.9%
Common | 175 | < 0.1%

Most frequent character per script

Latin

Value | Count | Frequency (%)
e | 139432 | 29.2%
m | 98828 | 20.7%
a | 98828 | 20.7%
l | 98828 | 20.7%
f | 40254 | 8.4%
n | 350 | 0.1%
o | 175 | < 0.1%
g | 175 | < 0.1%
d | 175 | < 0.1%
r | 175 | < 0.1%

Common

Value | Count | Frequency (%)
(space) | 175 | 100.0%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 477395 | 100.0%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
e | 139432 | 29.2%
m | 98828 | 20.7%
a | 98828 | 20.7%
l | 98828 | 20.7%
f | 40254 | 8.4%
n | 350 | 0.1%
o | 175 | < 0.1%
(space) | 175 | < 0.1%
g | 175 | < 0.1%
d | 175 | < 0.1%

tenure
Real number (ℝ≥0)

Distinct: 2426
Distinct (%): 2.5%
Missing: 2
Missing (%): < 0.1%
Infinite: 0
Infinite (%): 0.0%
Mean: 537.8873749
Minimum: 0
Maximum: 3139
Zeros: 70
Zeros (%): 0.1%
Memory size: 773.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 47
Q1: 226
Median: 412
Q3: 675
95th percentile: 1575
Maximum: 3139
Range: 3139
Interquartile range (IQR): 449

Descriptive statistics

Standard deviation: 457.6498739
Coefficient of variation (CV): 0.8508284359
Kurtosis: 2.199058275
Mean: 537.8873749
Median Absolute Deviation (MAD): 213
Skewness: 1.535680925
Sum: 53251388
Variance: 209443.4071
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
300 | 173 | 0.2%
303 | 170 | 0.2%
242 | 164 | 0.2%
272 | 163 | 0.2%
257 | 161 | 0.2%
297 | 161 | 0.2%
285 | 160 | 0.2%
280 | 160 | 0.2%
284 | 158 | 0.2%
278 | 158 | 0.2%
Other values (2416) | 97373 | 98.4%

Minimum 5 values

Value | Count | Frequency (%)
0 | 70 | 0.1%
1 | 60 | 0.1%
2 | 72 | 0.1%
3 | 79 | 0.1%
4 | 86 | 0.1%

Maximum 5 values

Value | Count | Frequency (%)
3139 | 3 | < 0.1%
3129 | 1 | < 0.1%
3128 | 1 | < 0.1%
3101 | 1 | < 0.1%
3019 | 1 | < 0.1%

friend_count
Real number (ℝ≥0)

ZEROS

Distinct: 2562
Distinct (%): 2.6%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 196.3507873
Minimum: 0
Maximum: 4923
Zeros: 1962
Zeros (%): 2.0%
Memory size: 773.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 3
Q1: 31
Median: 82
Q3: 206
95th percentile: 720
Maximum: 4923
Range: 4923
Interquartile range (IQR): 175

Descriptive statistics

Standard deviation: 387.304229
Coefficient of variation (CV): 1.972511719
Kurtosis: 50.09427289
Mean: 196.3507873
Median Absolute Deviation (MAD): 64
Skewness: 6.059008484
Sum: 19439317
Variance: 150004.5658
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 1962 | 2.0%
1 | 1816 | 1.8%
2 | 1117 | 1.1%
3 | 860 | 0.9%
5 | 789 | 0.8%
4 | 749 | 0.8%
10 | 737 | 0.7%
24 | 732 | 0.7%
6 | 720 | 0.7%
29 | 719 | 0.7%
Other values (2552) | 88802 | 89.7%

Minimum 5 values

Value | Count | Frequency (%)
0 | 1962 | 2.0%
1 | 1816 | 1.8%
2 | 1117 | 1.1%
3 | 860 | 0.9%
4 | 749 | 0.8%

Maximum 5 values

Value | Count | Frequency (%)
4923 | 1 | < 0.1%
4917 | 1 | < 0.1%
4863 | 1 | < 0.1%
4845 | 1 | < 0.1%
4844 | 1 | < 0.1%

friendships_initiated
Real number (ℝ≥0)

ZEROS

Distinct: 1519
Distinct (%): 1.5%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 107.4524711
Minimum: 0
Maximum: 4144
Zeros: 2997
Zeros (%): 3.0%
Memory size: 773.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 1
Q1: 17
Median: 46
Q3: 117
95th percentile: 418
Maximum: 4144
Range: 4144
Interquartile range (IQR): 100

Descriptive statistics

Standard deviation: 188.786951
Coefficient of variation (CV): 1.756934475
Kurtosis: 42.53560096
Mean: 107.4524711
Median Absolute Deviation (MAD): 36
Skewness: 5.150757415
Sum: 10638117
Variance: 35640.51287
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 2997 | 3.0%
1 | 2212 | 2.2%
2 | 1551 | 1.6%
3 | 1355 | 1.4%
4 | 1352 | 1.4%
6 | 1328 | 1.3%
5 | 1328 | 1.3%
11 | 1319 | 1.3%
8 | 1314 | 1.3%
13 | 1279 | 1.3%
Other values (1509) | 82968 | 83.8%

Minimum 5 values

Value | Count | Frequency (%)
0 | 2997 | 3.0%
1 | 2212 | 2.2%
2 | 1551 | 1.6%
3 | 1355 | 1.4%
4 | 1352 | 1.4%

Maximum 5 values

Value | Count | Frequency (%)
4144 | 1 | < 0.1%
3654 | 1 | < 0.1%
3594 | 1 | < 0.1%
3538 | 1 | < 0.1%
3415 | 1 | < 0.1%

likes
Real number (ℝ≥0)

ZEROS

Distinct: 2924
Distinct (%): 3.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 156.0787855
Minimum: 0
Maximum: 25111
Zeros: 22308
Zeros (%): 22.5%
Memory size: 773.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 11
Q3: 81
95th percentile: 726
Maximum: 25111
Range: 25111
Interquartile range (IQR): 80

Descriptive statistics

Standard deviation: 572.2806808
Coefficient of variation (CV): 3.666614134
Kurtosis: 200.4456878
Mean: 156.0787855
Median Absolute Deviation (MAD): 11
Skewness: 11.02370356
Sum: 15452268
Variance: 327505.1777
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 22308 | 22.5%
1 | 6928 | 7.0%
2 | 4434 | 4.5%
3 | 3240 | 3.3%
4 | 2507 | 2.5%
5 | 2027 | 2.0%
6 | 1806 | 1.8%
7 | 1618 | 1.6%
8 | 1430 | 1.4%
9 | 1381 | 1.4%
Other values (2914) | 51324 | 51.8%

Minimum 5 values

Value | Count | Frequency (%)
0 | 22308 | 22.5%
1 | 6928 | 7.0%
2 | 4434 | 4.5%
3 | 3240 | 3.3%
4 | 2507 | 2.5%

Maximum 5 values

Value | Count | Frequency (%)
25111 | 1 | < 0.1%
21652 | 1 | < 0.1%
16732 | 1 | < 0.1%
16583 | 1 | < 0.1%
14799 | 1 | < 0.1%

likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 2681
Distinct (%): 2.7%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 142.6893629
Minimum: 0
Maximum: 261197
Zeros: 24428
Zeros (%): 24.7%
Memory size: 773.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 1
Median: 8
Q3: 59
95th percentile: 561
Maximum: 261197
Range: 261197
Interquartile range (IQR): 58

Descriptive statistics

Standard deviation: 1387.919613
Coefficient of variation (CV): 9.726861091
Kurtosis: 17384.94
Mean: 142.6893629
Median Absolute Deviation (MAD): 8
Skewness: 112.0745682
Sum: 14126675
Variance: 1926320.851
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 24428 | 24.7%
1 | 7305 | 7.4%
2 | 4541 | 4.6%
3 | 3347 | 3.4%
4 | 2669 | 2.7%
5 | 2373 | 2.4%
6 | 1873 | 1.9%
7 | 1680 | 1.7%
8 | 1538 | 1.6%
9 | 1351 | 1.4%
Other values (2671) | 47898 | 48.4%

Minimum 5 values

Value | Count | Frequency (%)
0 | 24428 | 24.7%
1 | 7305 | 7.4%
2 | 4541 | 4.6%
3 | 3347 | 3.4%
4 | 2669 | 2.7%

Maximum 5 values

Value | Count | Frequency (%)
261197 | 1 | < 0.1%
178166 | 1 | < 0.1%
152014 | 1 | < 0.1%
106025 | 1 | < 0.1%
82623 | 1 | < 0.1%

mobile_likes
Real number (ℝ≥0)

ZEROS

Distinct: 2396
Distinct (%): 2.4%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 106.1162995
Minimum: 0
Maximum: 25111
Zeros: 35056
Zeros (%): 35.4%
Memory size: 773.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 4
Q3: 46
95th percentile: 481.9
Maximum: 25111
Range: 25111
Interquartile range (IQR): 46

Descriptive statistics

Standard deviation: 445.2529851
Coefficient of variation (CV): 4.195896268
Kurtosis: 360.9885806
Mean: 106.1162995
Median Absolute Deviation (MAD): 4
Skewness: 14.16123656
Sum: 10505832
Variance: 198250.2207
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 35056 | 35.4%
1 | 6297 | 6.4%
2 | 3941 | 4.0%
3 | 2917 | 2.9%
4 | 2265 | 2.3%
5 | 1794 | 1.8%
6 | 1598 | 1.6%
7 | 1395 | 1.4%
8 | 1212 | 1.2%
9 | 1149 | 1.2%
Other values (2386) | 41379 | 41.8%

Minimum 5 values

Value | Count | Frequency (%)
0 | 35056 | 35.4%
1 | 6297 | 6.4%
2 | 3941 | 4.0%
3 | 2917 | 2.9%
4 | 2265 | 2.3%

Maximum 5 values

Value | Count | Frequency (%)
25111 | 1 | < 0.1%
21652 | 1 | < 0.1%
16732 | 1 | < 0.1%
14039 | 1 | < 0.1%
13529 | 1 | < 0.1%

mobile_likes_received
Real number (ℝ≥0)

HIGH CORRELATION
SKEWED
ZEROS

Distinct: 2004
Distinct (%): 2.0%
Missing: 0
Missing (%): 0.0%
Infinite: 0
Infinite (%): 0.0%
Mean: 84.1204913
Minimum: 0
Maximum: 138561
Zeros: 30003
Zeros (%): 30.3%
Memory size: 773.6 KiB

Quantile statistics

Minimum: 0
5th percentile: 0
Q1: 0
Median: 4
Q3: 33
95th percentile: 317
Maximum: 138561
Range: 138561
Interquartile range (IQR): 33

Descriptive statistics

Standard deviation: 839.8894437
Coefficient of variation (CV): 9.984362083
Kurtosis: 15522.64932
Mean: 84.1204913
Median Absolute Deviation (MAD): 4
Skewness: 107.5312999
Sum: 8328181
Variance: 705414.2777
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
0 | 30003 | 30.3%
1 | 8243 | 8.3%
2 | 4948 | 5.0%
3 | 3608 | 3.6%
4 | 2944 | 3.0%
5 | 2383 | 2.4%
6 | 2022 | 2.0%
7 | 1745 | 1.8%
8 | 1521 | 1.5%
9 | 1437 | 1.5%
Other values (1994) | 40149 | 40.6%

Minimum 5 values

Value | Count | Frequency (%)
0 | 30003 | 30.3%
1 | 8243 | 8.3%
2 | 4948 | 5.0%
3 | 3608 | 3.6%
4 | 2944 | 3.0%

Maximum 5 values

Value | Count | Frequency (%)
138561 | 1 | < 0.1%
131244 | 1 | < 0.1%
89911 | 1 | < 0.1%
73333 | 1 | < 0.1%
43410 | 1 | < 0.1%

age_clean
Real number (ℝ≥0)

HIGH CORRELATION
MISSING

Distinct: 68
Distinct (%): 0.1%
Missing: 5723
Missing (%): 5.8%
Infinite: 0
Infinite (%): 0.0%
Mean: 33.37590051
Minimum: 13
Maximum: 80
Zeros: 0
Zeros (%): 0.0%
Memory size: 773.6 KiB

Quantile statistics

Minimum: 13
5th percentile: 15
Q1: 20
Median: 27
Q3: 45
95th percentile: 66
Maximum: 80
Range: 67
Interquartile range (IQR): 25

Descriptive statistics

Standard deviation: 16.54458902
Coefficient of variation (CV): 0.495704648
Kurtosis: -0.3645369504
Mean: 33.37590051
Median Absolute Deviation (MAD): 9
Skewness: 0.884072507
Sum: 3113304
Variance: 273.7234257
Monotonicity: Not monotonic

Histogram with fixed size bins (bins=50)

Common values

Value | Count | Frequency (%)
18 | 5196 | 5.2%
23 | 4404 | 4.4%
19 | 4391 | 4.4%
20 | 3769 | 3.8%
21 | 3671 | 3.7%
25 | 3641 | 3.7%
17 | 3283 | 3.3%
16 | 3086 | 3.1%
22 | 3032 | 3.1%
24 | 2827 | 2.9%
Other values (58) | 55980 | 56.5%
(Missing) | 5723 | 5.8%

Minimum 5 values

Value | Count | Frequency (%)
13 | 484 | 0.5%
14 | 1925 | 1.9%
15 | 2618 | 2.6%
16 | 3086 | 3.1%
17 | 3283 | 3.3%

Maximum 5 values

Value | Count | Frequency (%)
80 | 136 | 0.1%
79 | 112 | 0.1%
78 | 162 | 0.2%
77 | 169 | 0.2%
76 | 178 | 0.2%
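The age_clean figures are consistent with age_clean being age with every value above 80 set to missing: age spans 13 to 113 with 101 distinct values, age_clean spans 13 to 80 with 68 distinct values, the common values and the five smallest values have identical counts in both variables, and age_clean has 5723 missing values where age has none. A minimal sketch of that transformation under this inferred assumption; the cut-off constant `AGE_CAP` and the helper `clean_age` are hypothetical.

```python
import pandas as pd

# Inferred, hypothetical cut-off: age_clean's maximum in this report is 80.
AGE_CAP = 80


def clean_age(age: pd.Series, cap: int = AGE_CAP) -> pd.Series:
    """Return a float copy of `age` with values above `cap` replaced by NaN."""
    return age.where(age <= cap).astype(float)


age = pd.Series([14, 18, 80, 81, 113])
print(clean_age(age).tolist())  # [14.0, 18.0, 80.0, nan, nan]
```

Masking with `Series.where` keeps the row count unchanged, which is why the rows appear as missing values in age_clean rather than disappearing.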

age_Group
Categorical

MISSING

Distinct: 5
Distinct (%): < 0.1%
Missing: 5723
Missing (%): 5.8%
Memory size: 97.0 KiB
18-24: 25735
45+: 22989
25-34: 18639
<18: 16592
35-44: 9325
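The category counts suggest that the age_Group bins are closed on the right, which shifts each label's range up by one year. The <18 count (16592) equals the number of 13- to 18-year-olds in age_clean (484 + 1925 + 2618 + 3086 + 3283 + 5196), and the 18-24 count (25735) equals the number of 19- to 25-year-olds. That is the behaviour of `pandas.cut` with its default `right=True`. A minimal sketch comparing the two closures; the bin edges `EDGES` are hypothetical, since the report shows only the labels.

```python
import pandas as pd

# Hypothetical bin edges; only the labels appear in the report.
EDGES = [12, 18, 25, 35, 45, 120]
LABELS = ["<18", "18-24", "25-34", "35-44", "45+"]

ages = pd.Series([17, 18, 24, 25, 44, 45])

# right=True (the default): bins are (12, 18], (18, 25], ..., so 18 falls in "<18"
right_closed = pd.cut(ages, bins=EDGES, labels=LABELS)
# right=False: bins are [12, 18), [18, 25), ..., which matches the label text
left_closed = pd.cut(ages, bins=EDGES, labels=LABELS, right=False)

print(pd.DataFrame({"age": ages, "right=True": right_closed, "right=False": left_closed}))
```

With `right=False`, an 18-year-old lands in 18-24 and a 25-year-old in 25-34, as the labels imply.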

Length

Max length: 5
Median length: 5
Mean length: 4.151350772
Min length: 3

Characters and Unicode

Total characters: 387238
Distinct characters: 9
Distinct categories: 3
Distinct scripts: 1
Distinct blocks: 1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique: 0
Unique (%): 0.0%

Sample

1st row: <18
2nd row: <18
3rd row: <18
4th row: <18
5th row: <18
Value | Count | Frequency (%)
18-24 | 25735 | 26.0%
45+ | 22989 | 23.2%
25-34 | 18639 | 18.8%
<18 | 16592 | 16.8%
35-44 | 9325 | 9.4%
(Missing) | 5723 | 5.8%

Histogram of lengths of the category

Words

Value | Count | Frequency (%)
18-24 | 25735 | 27.6%
45 | 22989 | 24.6%
25-34 | 18639 | 20.0%
18 | 16592 | 17.8%
35-44 | 9325 | 10.0%

Most occurring characters

Value | Count | Frequency (%)
4 | 86013 | 22.2%
- | 53699 | 13.9%
5 | 50953 | 13.2%
2 | 44374 | 11.5%
1 | 42327 | 10.9%
8 | 42327 | 10.9%
3 | 27964 | 7.2%
+ | 22989 | 5.9%
< | 16592 | 4.3%

Most occurring categories

Value | Count | Frequency (%)
Decimal Number | 293958 | 75.9%
Dash Punctuation | 53699 | 13.9%
Math Symbol | 39581 | 10.2%

Most frequent character per category

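Decimal Number

Value | Count | Frequency (%)
4 | 86013 | 29.3%
5 | 50953 | 17.3%
2 | 44374 | 15.1%
1 | 42327 | 14.4%
8 | 42327 | 14.4%
3 | 27964 | 9.5%

Math Symbol

Value | Count | Frequency (%)
+ | 22989 | 58.1%
< | 16592 | 41.9%

Dash Punctuation

Value | Count | Frequency (%)
- | 53699 | 100.0%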

Most occurring scripts

Value | Count | Frequency (%)
Common | 387238 | 100.0%

Most frequent character per script

Common

Value | Count | Frequency (%)
4 | 86013 | 22.2%
- | 53699 | 13.9%
5 | 50953 | 13.2%
2 | 44374 | 11.5%
1 | 42327 | 10.9%
8 | 42327 | 10.9%
3 | 27964 | 7.2%
+ | 22989 | 5.9%
< | 16592 | 4.3%

Most occurring blocks

Value | Count | Frequency (%)
ASCII | 387238 | 100.0%

Most frequent character per block

ASCII

Value | Count | Frequency (%)
4 | 86013 | 22.2%
- | 53699 | 13.9%
5 | 50953 | 13.2%
2 | 44374 | 11.5%
1 | 42327 | 10.9%
8 | 42327 | 10.9%
3 | 27964 | 7.2%
+ | 22989 | 5.9%
< | 16592 | 4.3%

Interactions


Correlations


Pearson's r

The Pearson correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 indicates no linear correlation, and +1 indicates total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exactly linear relationship the slope of the line affects only the sign of r, not its magnitude.

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
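A minimal NumPy sketch of this definition, checked against `numpy.corrcoef`. The helper `pearson_r` is illustrative. Population (ddof=0) moments are used for both the covariance and the standard deviations, so the normalising factors cancel and the result equals the sample-based ratio.

```python
import numpy as np


def pearson_r(x, y) -> float:
    """Pearson's r: cov(X, Y) / (std(X) * std(Y))."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    # Population (ddof=0) covariance and standard deviations; the 1/n factors cancel.
    cov = np.mean((x - x.mean()) * (y - y.mean()))
    return float(cov / (x.std() * y.std()))


x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
print(pearson_r(x, y), np.corrcoef(x, y)[0, 1])
```

Shifting y and rescaling it by a positive factor leaves r unchanged, e.g. `pearson_r(x, [3 * v + 7 for v in y])` equals `pearson_r(x, y)` up to rounding, while a negative factor flips its sign.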

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at capturing nonlinear monotonic relationships than Pearson's r. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 indicates no monotonic correlation, and +1 indicates total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
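A minimal sketch of the rank-based definition: rank both variables (ties receive their average rank) and compute Pearson's r on the ranks. The helper `spearman_rho` is illustrative. The example uses y = x³, which is nonlinear but perfectly monotonic, so ρ is 1 while Pearson's r is not.

```python
import numpy as np
import pandas as pd


def spearman_rho(x, y) -> float:
    """Spearman's rho: Pearson's r computed on the (average) ranks of x and y."""
    rx = pd.Series(x, dtype=float).rank()  # ties receive the average rank
    ry = pd.Series(y, dtype=float).rank()
    return float(np.corrcoef(rx, ry)[0, 1])


x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]  # nonlinear but perfectly monotonic

print(spearman_rho(x, y))               # approximately 1.0
print(pd.Series(x).corr(pd.Series(y)))  # Pearson's r, noticeably below 1
```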

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 indicates no correlation, and +1 indicates total positive correlation.

To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations. τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs. (This is the τ_a form; tie-adjusted variants such as τ_b change the denominator.)
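A direct, O(n²) sketch of the pair-counting definition. It computes τ_a, where tied pairs count as neither concordant nor discordant but still appear in the denominator. As far as we know, `pandas.Series.corr(method="kendall")` delegates to SciPy, whose default is the tie-adjusted τ_b, so results can differ when ties are present. The helper name `kendall_tau_a` is illustrative.

```python
from itertools import combinations


def kendall_tau_a(x, y) -> float:
    """Kendall's tau-a: (concordant - discordant) / total number of pairs."""
    n = len(x)
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        s = (x[i] - x[j]) * (y[i] - y[j])
        if s > 0:
            concordant += 1   # both variables move in the same direction
        elif s < 0:
            discordant += 1   # the variables move in opposite directions
        # pairs tied in x or y (s == 0) count as neither
    return (concordant - discordant) / (n * (n - 1) / 2)


# One swapped pair out of six: (5 - 1) / 6
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 0.6666666666666666
```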

Phik (φk)

Phik (φk) is a newer, practical correlation coefficient that works consistently across categorical, ordinal and interval variables, captures non-linear dependency, and reduces to the Pearson correlation coefficient for a bivariate normal input distribution. Extensive documentation is available for the phik package.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V are known to be biased, even for large samples, so a bias-corrected measure proposed by Bergsma (2013) is used instead.
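A NumPy sketch of the bias-corrected Cramér's V following Bergsma's (2013) corrections to φ² and to the table dimensions. It uses the plain Pearson chi-squared statistic with no continuity correction. pandas-profiling's own implementation may differ in such details. The helper `cramers_v_corrected` is illustrative and assumes a contingency table with no empty rows or columns.

```python
import numpy as np


def cramers_v_corrected(table) -> float:
    """Bias-corrected Cramér's V for an r x k contingency table of counts."""
    obs = np.asarray(table, dtype=float)
    n = obs.sum()
    r, k = obs.shape
    # Pearson chi-squared statistic against the independence expectation
    # (assumes every row and column total is non-zero).
    expected = np.outer(obs.sum(axis=1), obs.sum(axis=0)) / n
    chi2 = ((obs - expected) ** 2 / expected).sum()
    phi2 = chi2 / n
    # Bias corrections for phi^2 and for the table dimensions
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denom = min(k_corr - 1, r_corr - 1)
    return float(np.sqrt(phi2_corr / denom)) if denom > 0 else 0.0


print(cramers_v_corrected([[50, 0], [0, 50]]))   # perfect association, about 1.0
print(cramers_v_corrected([[25, 25], [25, 25]]))  # independence, 0.0
```

The `max(0.0, ...)` clamp keeps small samples from producing a negative corrected φ², which would otherwise make the square root undefined.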

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion visually.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
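Nullity correlation can be sketched as the Pearson correlation between the columns' missingness indicators, which is the idea behind missingno's nullity heatmap. The toy frame below is hypothetical and only borrows column names from this dataset. Columns that are never (or always) missing are dropped first because their indicators have zero variance.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from the report).
df = pd.DataFrame({
    "age_clean": [14.0, np.nan, 30.0, np.nan, 55.0],
    "age_Group": ["<18", np.nan, "25-34", np.nan, "45+"],
    "tenure": [266.0, 6.0, np.nan, 93.0, 82.0],
    "gender": ["male", "female", "male", "female", "male"],
})

# True where a value is missing
nullity = df.isnull()
# Keep only columns whose missingness actually varies
nullity = nullity.loc[:, nullity.any() & ~nullity.all()]
# Pearson correlation of the 0/1 missingness indicators
nullity_corr = nullity.astype(float).corr()
print(nullity_corr)
```

Here age_clean and age_Group are always missing together, so their nullity correlation is 1, while gender is never missing and is excluded.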

Sample

First rows

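  | userid | age | gender | tenure | friend_count | friendships_initiated | likes | likes_received | mobile_likes | mobile_likes_received | age_clean | age_Group
0 | 2094382 | 14 | male | 266.0 | 0 | 0 | 0 | 0 | 0 | 0 | 14.0 | <18
1 | 1192601 | 14 | female | 6.0 | 0 | 0 | 0 | 0 | 0 | 0 | 14.0 | <18
2 | 2083884 | 14 | male | 13.0 | 0 | 0 | 0 | 0 | 0 | 0 | 14.0 | <18
3 | 1203168 | 14 | female | 93.0 | 0 | 0 | 0 | 0 | 0 | 0 | 14.0 | <18
4 | 1733186 | 14 | male | 82.0 | 0 | 0 | 0 | 0 | 0 | 0 | 14.0 | <18
5 | 1524765 | 14 | male | 15.0 | 0 | 0 | 0 | 0 | 0 | 0 | 14.0 | <18
6 | 1136133 | 13 | male | 12.0 | 0 | 0 | 0 | 0 | 0 | 0 | 13.0 | <18
7 | 1680361 | 13 | female | 0.0 | 0 | 0 | 0 | 0 | 0 | 0 | 13.0 | <18
8 | 1365174 | 13 | male | 81.0 | 0 | 0 | 0 | 0 | 0 | 0 | 13.0 | <18
9 | 1712567 | 13 | male | 171.0 | 0 | 0 | 0 | 0 | 0 | 0 | 13.0 | <18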

Last rows

useridagegendertenurefriend_countfriendships_initiatedlikeslikes_receivedmobile_likesmobile_likes_receivedage_cleanage_Group
98993165456519male394.0453841444501150884435596119.018-24
98994206300620female402.01988332735110602572487333320.018-24
98995113216420female699.03611973450777684414690920.018-24
98996166869524female182.02938127260181776558431170824.018-24
98997145898528female290.0221816184626102684290425028.025-34
98998126829968female541.0211834139961808935051188768.045+
98999125615318female21.01968172044011341243991059218.0<18
99000119594315female111.0200215241195912554119591146215.0<18
99001146802323female416.02560185450665164506576023.018-24
99002139789639female397.020497689410124439410953039.035-44